Biswas, Jacobs: an Efficient Algorithm for Learning Distances
Authors
Abstract
Semi-supervised clustering improves performance using constraints that indicate if two images belong to the same category or not. Success depends on how effectively these constraints can be propagated to the unsupervised data. Many algorithms use these constraints to learn Euclidean distances in a vector space. However, distances between images are often computed using classifiers or combinatorial algorithms that make distance learning difficult. In such a setting, we propose to use the triangle inequality to propagate constraints to unsupervised data. First, we formulate distance learning as a metric nearness problem where a brute-force Quadratic Program (QP) is used to modify the distances such that the total change in distances is minimized but the final distances obey the triangle inequality. Then we propose a much faster version of the QP that enforces only a subset of the inequalities and can be applied to real world clustering datasets. We show experimentally that this efficient QP produces stronger clustering results on face, leaf and video image datasets, outperforming state-of-the-art methods for constrained clustering. To gain insight into the effectiveness of this algorithm, we analyze a special case of the semi-supervised clustering problem, and show that the subset of constraints that we sample still preserves key properties of the distances that would be produced by enforcing all constraints.
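The idea of propagating a pairwise constraint through the triangle inequality can be illustrated with a small sketch. It is not the QP described in the abstract: it swaps in a Floyd-Warshall shortest-path closure, a simpler repair that only lowers distances and does not minimize the total change. The function names `triangle_violations` and `shortest_path_closure`, and the toy distance values, are assumptions made for illustration.

```python
from itertools import permutations


def triangle_violations(d, tol=1e-12):
    """List all ordered triples (i, j, k) with d[i][j] > d[i][k] + d[k][j]."""
    n = len(d)
    return [
        (i, j, k)
        for i, j, k in permutations(range(n), 3)
        if d[i][j] > d[i][k] + d[k][j] + tol
    ]


def shortest_path_closure(d):
    """Floyd-Warshall: replace each distance by the shortest path length.

    This is a stand-in for the QP in the abstract, not the authors' method.
    It only ever decreases distances.
    """
    n = len(d)
    m = [row[:] for row in d]  # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if m[i][k] + m[k][j] < m[i][j]:
                    m[i][j] = m[i][k] + m[k][j]
    return m


# Hypothetical pairwise distances among three images.
d = [
    [0.0, 0.9, 0.2],
    [0.9, 0.0, 0.8],
    [0.2, 0.8, 0.0],
]

# Must-link constraint between images 0 and 1: force their distance to 0.
d[0][1] = d[1][0] = 0.0

before = triangle_violations(d)      # the constraint breaks two inequalities
repaired = shortest_path_closure(d)  # image 1 inherits image 0's distance to image 2
after = triangle_violations(repaired)
```

In this toy case the must-link constraint on images 0 and 1 pulls the distance between images 1 and 2 down to match the distance between images 0 and 2, which is the kind of propagation to unsupervised pairs that the abstract describes.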
Similar resources
Active subclustering
Although there are many excellent clustering algorithms, effective clustering remains very challenging for large datasets that contain many classes. Image clustering presents further problems because automatically computed image distances are often noisy. We address these challenges in two ways. First, we propose a new algorithm to cluster a subset of the images only (we call this subclustering...
An efficient algorithm for finding the semi-obnoxious $(k,l)$-core of a tree
In this paper we study the problem of finding the $(k,l)$-core of a tree whose vertices have positive or negative weights. Let $T=(V,E)$ be a tree. The $(k,l)$-core of $T$ is a subtree with at most $k$ leaves and a diameter of at most $l$ that minimizes the sum of the weighted distances from all vertices to this subtree. We show that, when the sum of the weights of vertices is negative, t...
An Efficient Algorithm for Learning Distances that Obey the Triangle Inequality
Semi-supervised clustering of images has been an interesting problem for machine learning and computer vision researchers for decades. Pairwise constrained clustering is a popular paradigm for semi supervision that uses knowledge about whether two images belong to the same category (must-link constraint) or not (can’t-link constraint). Performance of constrained clustering algorithms can be imp...
An Efficient Hybrid Metaheuristic for Capacitated p-Median Problem
Capacitated p-median problem (CPMP) is a well-known facility-location problem, in which p capacitated facility points are selected to satisfy n demand points in such a way that the total assigned demand to each facility does not exceed its capacity. Minimizing the total sum of distances between each demand point and its nearest facility point is the objective of the problem. Developing an effic...
Using an Evaluator Fixed Structure Learning Automata in Sampling of Social Networks
Social networks are streaming and diverse and include a wide range of edges. They evolve continuously over time and are formed by the activities among users (such as tweets, emails, etc.), where each activity adds an edge to the network graph. Despite their popularity, the dynamic nature and large size of most social networks make it difficult or impossible to study the entire networ...